Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology

نویسندگان

  • Adam C. English
  • Stephen Richards
  • Yi Han
  • Min Wang
  • Vanesa Vee
  • Jiaxin Qu
  • Xiang Qin
  • Donna M. Muzny
  • Jeffrey G. Reid
  • Kim C. Worley
  • Richard A. Gibbs
چکیده

Many genomes have been sequenced to high-quality draft status using Sanger capillary electrophoresis and/or newer short-read sequence data and whole genome assembly techniques. However, even the best draft genomes contain gaps and other imperfections due to limitations in the input data and the techniques used to build draft assemblies. Sequencing biases, repetitive genomic features, genomic polymorphism, and other complicating factors all come together to make some regions difficult or impossible to assemble. Traditionally, draft genomes were upgraded to "phase 3 finished" status using time-consuming and expensive Sanger-based manual finishing processes. For more facile assembly and automated finishing of draft genomes, we present here an automated approach to finishing using long-reads from the Pacific Biosciences RS (PacBio) platform. Our algorithm and associated software tool, PBJelly, (publicly available at https://sourceforge.net/projects/pb-jelly/) automates the finishing process using long sequence reads in a reference-guided assembly process. PBJelly also provides "lift-over" co-ordinate tables to easily port existing annotations to the upgraded assembly. Using PBJelly and long PacBio reads, we upgraded the draft genome sequences of a simulated Drosophila melanogaster, the version 2 draft Drosophila pseudoobscura, an assembly of the Assemblathon 2.0 budgerigar dataset, and a preliminary assembly of the Sooty mangabey. With 24× mapped coverage of PacBio long-reads, we addressed 99% of gaps and were able to close 69% and improve 12% of all gaps in D. pseudoobscura. With 4× mapped coverage of PacBio long-reads we saw reads address 63% of gaps in our budgerigar assembly, of which 32% were closed and 63% improved. With 6.8× mapped coverage of mangabey PacBio long-reads we addressed 97% of gaps and closed 66% of addressed gaps and improved 19%. The accuracy of gap closure was validated by comparison to Sanger sequencing on gaps from the original D. pseudoobscura draft assembly and shown to be dependent on initial reference quality.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Reconstructing complex regions of genomes using long-read sequencing technology.

Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, l...

متن کامل

Direct sequencing of small genomes on the Pacific Biosciences RS without library preparation.

We have developed a sequencing method on the Pacific Biosciences RS sequencer (the PacBio) for small DNA molecules that avoids the need for a standard library preparation. To date this approach has been applied toward sequencing single-stranded and double-stranded viral genomes, bacterial plasmids, plasmid vector models for DNA-modification analysis, and linear DNA fragments covering an entire ...

متن کامل

Microsatellite marker discovery using single molecule real-time circular consensus sequencing on the Pacific Biosciences RS.

Microsatellite sequences are important markers for population genetics studies. In the past, the development of adequate microsatellite primers has been cumbersome. However with the advent of next-generation sequencing technologies, marker identification in genomes of non-model species has been greatly simplified. Here we describe microsatellite discovery on a Pacific Biosciences single molecul...

متن کامل

Scaffolding of a bacterial genome using MinION nanopore sequencing

Second generation sequencing has revolutionized genomic studies. However, most genomes contain repeated DNA elements that are longer than the read lengths achievable with typical sequencers, so the genomic order of several generated contigs cannot be easily resolved. A new generation of sequencers offering substantially longer reads is emerging, notably the Pacific Biosciences (PacBio) RS II sy...

متن کامل

Genetic Adaptation of Porcine Circovirus Type 1 to Cultured Porcine Kidney Cells Revealed by Single-Molecule Long-Read Sequencing Technology

Porcine circovirus type 1 (PCV1) is a nonpathogenic circovirus, and a contaminant of the porcine kidney (PK-15) cell line. We present the complete and annotated genome sequence of strain Szeged of PCV1, determined by Pacific Biosciences RSII long-read sequencing platform.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 7  شماره 

صفحات  -

تاریخ انتشار 2012